AITopics | Evans

Collaborating Authors

Evans

1e89c12621c0315373f20f0aeabe5dbe-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-9-2026, 04:01:29 GMT

There are two updating strategies: 1) mimicking strategy to generate similar samples based on original data, preserving stylistic and contextual essence, and 2) extending strategy that further expands existing samples at varying cognitive levels by adapting Bloom's taxonomy of educational objectives. Extensive experiments on updated MMLU and BIG-Bench demonstrate the stability of the proposed strategies and find that the mimicking strategy can effectively alleviate issues of overestimation from benchmark leakage. In cases where the efficient mimicking strategy fails, our extending strategy still shows promising results.

large language model, machine learning, natural language, (19 more...)

Country:

Asia > China (0.04)
North America > United States > Colorado > Weld County > Evans (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models

Neural Information Processing Systems, Oct-9-2025, 20:29:27 GMT

arxiv preprint arxiv, cognitive level, dataset, (14 more...)

Country:

North America > United States > Mississippi (0.04)
Asia > Singapore (0.04)
North America > United States > Colorado > Weld County > Evans (0.04)
(3 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Education (0.88)
Information Technology (0.67)
Leisure & Entertainment > Sports > Basketball (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.92)

Writing as a testbed for open ended agents

Gooding, Sian, Lopez-Rivilla, Lucia, Grefenstette, Edward

arXiv.org Artificial Intelligence, Mar-25-2025

Open-ended tasks are particularly challenging for LLMs due to the vast solution space, demanding both expansive exploration and adaptable strategies, especially when success lacks a clear, objective definition. Writing, with its vast solution space and subjective evaluation criteria, provides a compelling testbed for studying such problems. In this paper, we investigate the potential of LLMs to act as collaborative co-writers, capable of suggesting and implementing text improvements autonomously. We analyse three prominent LLMs - Gemini 1.5 Pro, Claude 3.5 Sonnet, and GPT-4o - focusing on how their action diversity, human alignment, and iterative improvement capabilities impact overall performance. This work establishes a framework for benchmarking autonomous writing agents and, more broadly, highlights fundamental challenges and potential solutions for building systems capable of excelling in diverse open-ended domains.

gemini 1, large language model, machine learning, (21 more...)

2503.19711

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Colorado > Weld County > Evans (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

R$^2$: A LLM Based Novel-to-Screenplay Generation Framework with Causal Plot Graphs

Lin, Zefeng, Xiao, Yi, Mo, Zhiqiang, Zhang, Qifan, Wang, Jie, Chen, Jiayang, Zhang, Jiajing, Zhang, Hui, Liu, Zhengyi, Fang, Xianyong, Xu, Xiaohua

arXiv.org Artificial Intelligence, Mar-19-2025

Published as a conference paper at ICLR 2025. R$^2$: A LLM Based Novel-to-Screenplay Generation Framework with Causal Plot Graphs. Zefeng Lin (1), Yi Xiao (1), Zhiqiang Mo (1), Qifan Zhang (1), Jie Wang (2), Jiayang Chen (2), Jiajing Zhang (2), Hui Zhang (1), Zhengyi Liu (3), Xianyong Fang (3), Xiaohua Xu (1). (1) University of Science and Technology of China; (2) Anhui Jianzhu University; (3) Anhui University.

Abstract: Automatically adapting novels into screenplays is important for the TV, film, or opera industries to promote products at low cost. The strong performance of large language models (LLMs) in long-text generation motivates us to propose an LLM-based framework, Reader-Rewriter (R$^2$), for this task. However, there are two fundamental challenges here. First, LLM hallucinations may cause inconsistent plot extraction and screenplay generation. Second, the causality-embedded plot lines should be effectively extracted for coherent rewriting. Therefore, two corresponding tactics are proposed: 1) a hallucination-aware refinement method (HAR) to iteratively discover and eliminate the effects of hallucinations; and 2) a causal plot-graph construction method (CPC) based on a greedy cycle-breaking algorithm to efficiently construct plot lines with event causalities. Employing these techniques, R$^2$ utilizes two modules to mimic the human screenplay rewriting process: the Reader module adopts a sliding window and CPC to build the causal plot graphs, while the Rewriter module first generates the scene outlines based on the graphs and then the screenplays. HAR is integrated into both modules for accurate LLM inference.
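
The abstract names a greedy cycle-breaking algorithm for building causal plot graphs but does not describe it. The following is only a minimal sketch of one common greedy approach, not the authors' CPC method. It assumes each causal relation carries a numeric strength and that stronger relations should be kept first; the edge format and the names `reachable` and `break_cycles` are hypothetical.

```python
from collections import defaultdict


def reachable(adj, start, goal):
    """Return True if goal can be reached from start by following edges in adj."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def break_cycles(edges):
    """Greedily keep the strongest edges that do not close a cycle.

    edges: iterable of (cause, effect, strength) tuples (assumed format).
    Returns the kept edges, which together form a directed acyclic graph.
    """
    adj = defaultdict(set)
    kept = []
    # Visit edges from strongest to weakest (assumed priority rule).
    for cause, effect, strength in sorted(edges, key=lambda e: -e[2]):
        # Adding cause -> effect closes a cycle exactly when
        # cause is already reachable from effect.
        if cause == effect or reachable(adj, effect, cause):
            continue
        adj[cause].add(effect)
        kept.append((cause, effect, strength))
    return kept
```

For example, given the edges `("a", "b", 0.9)`, `("b", "c", 0.8)` and `("c", "a", 0.3)`, the sketch keeps the first two and drops the weakest edge, which would otherwise close the loop a → b → c → a.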

artificial intelligence, large language model, natural language, (18 more...)

2503.15655

Country:

Asia > China (0.24)
North America > United States > Colorado > Weld County > Evans (0.04)

Genre: Research Report (0.82)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

ACAI for SBOs: AI Co-creation for Advertising and Inspiration for Small Business Owners

Karnatak, Nimisha, Baranes, Adrien, Marchant, Rob, Butler, Triona, Olson, Kristen

arXiv.org Artificial Intelligence, Mar-9-2025

Small business owners (SBOs) often lack the resources and design experience needed to produce high-quality advertisements. To address this, we developed ACAI (AI Co-Creation for Advertising and Inspiration), a GenAI-powered multimodal advertisement creation tool, and conducted a user study with 16 SBOs in London to explore their perceptions of and interactions with ACAI in advertisement creation. Our findings reveal that structured inputs enhance user agency and control while improving AI outputs by facilitating better brand alignment, enhancing AI transparency, and offering scaffolding that assists novice designers, such as SBOs, in formulating prompts. We also found that ACAI's multimodal interface bridges the design skill gap for SBOs who have a clear advertisement vision but lack the design jargon necessary for effective prompting. Building on our findings, we propose three capabilities: contextual intelligence, adaptive interactions, and data management, with corresponding design recommendations to advance the co-creative attributes of AI-mediated design tools.

acai, computing machinery, participant, (13 more...)

2503.06729

Country:

North America > United States > New York > New York County > New York City (0.07)
North America > United States > Colorado > Weld County > Evans (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Marketing (1.00)
Information Technology (0.93)
Law > Intellectual Property & Technology Law (0.67)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(5 more...)

AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

Lu, Ximing, Sclar, Melanie, Hallinan, Skyler, Mireshghallah, Niloofar, Liu, Jiacheng, Han, Seungju, Ettinger, Allyson, Jiang, Liwei, Chandu, Khyathi, Dziri, Nouha, Choi, Yejin

arXiv.org Artificial Intelligence, Jan-12-2025

Creativity has long been considered one of the most difficult aspects of human intelligence for AI to mimic. However, the rise of Large Language Models (LLMs), like ChatGPT, has raised questions about whether AI can match or even surpass human creativity. We present CREATIVITY INDEX as the first step to quantify the linguistic creativity of a text by reconstructing it from existing text snippets on the web. CREATIVITY INDEX is motivated by the hypothesis that the seemingly remarkable creativity of LLMs may be attributable in large part to the creativity of human-written texts on the web. To compute CREATIVITY INDEX efficiently, we introduce DJ SEARCH, a novel dynamic programming algorithm that can search verbatim and near-verbatim matches of text snippets from a given document against the web. Experiments reveal that the CREATIVITY INDEX of professional human authors is on average 66.2% higher than that of LLMs, and that alignment reduces the CREATIVITY INDEX of LLMs by an average of 30.1%. In addition, we find that distinguished authors like Hemingway exhibit measurably higher CREATIVITY INDEX compared to other human writers. Finally, we demonstrate that CREATIVITY INDEX can be used as a surprisingly effective criterion for zero-shot machine text detection, surpassing the strongest existing zero-shot system, DetectGPT, by a significant margin of 30.2%, and even outperforming the strongest supervised system, GhostBuster, in five out of six domains.
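
The idea of scoring a text by how much of it can be reconstructed from existing snippets can be illustrated with a toy calculation. The sketch below is not DJ SEARCH and not the paper's exact CREATIVITY INDEX definition: it assumes verbatim matching only, whitespace tokenization, a fixed n-gram length, and a small in-memory reference corpus in place of the web. The function name `uncovered_fraction` and its parameters are hypothetical.

```python
def ngrams(tokens, n):
    """Return the set of all contiguous n-token tuples in tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def uncovered_fraction(text, corpus_docs, n=3):
    """Fraction of words in text not covered by any verbatim n-gram
    that also appears in the reference corpus (higher = more novel wording).
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0  # empty text: nothing to score (assumed convention)

    # Collect every n-gram seen in the reference documents.
    corpus = set()
    for doc in corpus_docs:
        corpus |= ngrams(doc.lower().split(), n)

    # Mark each word that falls inside at least one matched n-gram.
    covered = [False] * len(tokens)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) in corpus:
            for j in range(i, i + n):
                covered[j] = True

    return 1 - sum(covered) / len(tokens)
```

With the text "the cat sat on the mat today", the single reference document "the cat sat on a chair", and `n=3`, four of the seven words fall inside matched trigrams, so the function returns 3/7.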

creativity, llm, (15 more...)

2410.04265

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > Germany > Hamburg (0.04)
(12 more...)

Genre:

Overview (0.93)
Research Report > Experimental Study (0.46)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.66)
Information Technology > Security & Privacy (0.46)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Dialogue with the Machine and Dialogue with the Art World: Evaluating Generative AI for Culturally-Situated Creativity

Qadri, Rida, Mirowski, Piotr, Gabriellan, Aroussiak, Mehr, Farbod, Gupta, Huma, Karimi, Pamela, Denton, Remi

arXiv.org Artificial Intelligence, Dec-18-2024

This paper proposes dialogue as a method for evaluating generative AI tools for culturally-situated creative practice, one that recognizes the socially situated nature of art. Drawing on sociologist Howard Becker's concept of Art Worlds, this method expands the scope of traditional AI and creativity evaluations beyond benchmarks, user studies with crowd-workers, or focus groups conducted with artists. Our method involves two mutually informed dialogues: 1) 'dialogues with art worlds,' placing artists in conversation with experts such as art historians, curators, and archivists, and 2) 'dialogues with the machine,' facilitated through structured artist- and critic-led experimentation with state-of-the-art generative AI tools. We demonstrate the value of this method through a case study with artists and experts steeped in non-western art worlds, specifically the Persian Gulf. We trace how these dialogues help create culturally rich and situated forms of evaluation for representational possibilities of generative AI that mimic the reception of generative artwork in the broader art ecosystem. Putting artists in conversation with commentators also allows artists to shift their use of the tools to respond to their cultural and creative context. Our study can provide generative AI researchers an understanding of the complex dynamics of technology, human creativity and the socio-politics of art worlds, to build more inclusive machines for diverse art worlds.

artist, machine learning, natural language, (16 more...)

2412.14077

Country:

Indian Ocean > Arabian Gulf (0.25)
Asia > Middle East > Saudi Arabia > Arabian Gulf (0.25)
Asia > Middle East > Iran (0.05)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations

Worledge, Theodora, Hashimoto, Tatsunori, Guestrin, Carlos

arXiv.org Artificial Intelligence, Nov-26-2024

Across all fields of academic study, experts cite their sources when sharing information. While large language models (LLMs) excel at synthesizing information, they do not provide reliable citation to sources, making it difficult to trace and verify the origins of the information they present. In contrast, search engines make sources readily accessible to users and place the burden of synthesizing information on the user. Through a survey, we find that users prefer search engines over LLMs for high-stakes queries, where concerns regarding information provenance outweigh the perceived utility of LLM responses. To examine the interplay between verifiability and utility of information-sharing tools, we introduce the extractive-abstractive spectrum, in which search engines and LLMs are extreme endpoints encapsulating multiple unexplored intermediate operating points. Search engines are extractive because they respond to queries with snippets of sources with links (citations) to the original webpages. LLMs are abstractive because they address queries with answers that synthesize and logically transform relevant information from training and in-context sources without reliable citation. We define five operating points that span the extractive-abstractive spectrum and conduct human evaluations on seven systems across four diverse query distributions that reflect real-world QA settings: web search, language simplification, multi-step reasoning, and medical advice. As outputs become more abstractive, we find that perceived utility improves by as much as 200%, while the proportion of properly cited sentences decreases by as much as 50% and users take up to 3 times as long to verify cited information. Our findings recommend distinct operating points for domain-specific LLM systems and our failure analysis informs approaches to high-utility LLM systems that empower users to verify information.

film director and screenwriter, krishnankoil venkadachalam mahadevan, otc cold and cough medicine, (13 more...)

2411.17375

Country:

Europe > Czechia (0.27)
Europe > United Kingdom > Scotland (0.14)
Europe > France (0.04)
(34 more...)

Genre:

Questionnaire & Opinion Survey (1.00)
Personal (1.00)
Research Report > New Finding (0.66)

Industry:

Media > Television (1.00)
Media > Film (1.00)
Leisure & Entertainment (1.00)
(13 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps

Wang, Han, Zhao, Yilin, Li, Dian, Wang, Xiaohan, Liu, Gang, Lan, Xuguang, Wang, Hui

arXiv.org Artificial Intelligence, Oct-14-2024

Humor is a culturally nuanced aspect of human language that presents challenges for understanding and generation, requiring participants to possess good creativity and strong associative thinking. Similar to reasoning tasks like solving math problems, humor generation requires continuous reflection and revision to foster creative thinking, rather than relying on a sudden flash of inspiration as in the Creative Leap-of-Thought (CLoT) paradigm. Although CLoT can realize the ability of remote association generation, this paradigm fails to generate humorous content. Therefore, in this paper, we propose a systematic way of thinking about generating humor and, based on it, build the Creative Leap of Structured Thought (CLoST) frame. First, a reward model is necessary to make error correction possible, since there is currently neither an expert model of humor nor a usable rule to determine whether a piece of content is humorous. Judgement-oriented instructions are designed to improve the capability of a model, and we also propose an open-domain instruction evolutionary method to fully unleash the potential. Then, through reinforcement learning, the model learns to hone the rationales of its thought chain and refine the strategies it uses. Thus, it learns to recognize and correct its mistakes, and finally generates the most humorous and creative answer. These findings deepen our understanding of the creative capabilities of LLMs and provide ways to enhance LLMs' creative abilities for cross-domain innovative applications.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

2410.1037

Country:

North America > United States > Colorado > Weld County > Evans (0.04)
Asia > China > Shandong Province > Qingdao (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Agents' Room: Narrative Generation through Multi-step Collaboration

Huot, Fantine, Amplayo, Reinald Kim, Palomaki, Jennimaria, Jakobovits, Alice Shoshana, Clark, Elizabeth, Lapata, Mirella

arXiv.org Artificial Intelligence, Oct-3-2024

Writing compelling fiction is a multifaceted process combining elements such as crafting a plot, developing interesting characters, and using evocative language. While large language models (LLMs) show promise for story writing, they currently rely heavily on intricate prompting, which limits their use. We propose Agents' Room, a generation framework inspired by narrative theory, that decomposes narrative writing into subtasks tackled by specialized agents. To illustrate our method, we introduce Tell Me A Story, a high-quality dataset of complex writing prompts and human-written stories, and a novel evaluation framework designed specifically for assessing long narratives. We show that Agents' Room generates stories that are preferred by expert evaluators over those produced by baseline systems by leveraging collaboration and specialization to decompose the complex story writing task into tractable components. We provide extensive analysis with automated and human-based metrics of the generated output.

agent, computational linguistic, conference paper, (14 more...)

2410.02603

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Texas (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(17 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Health Care Providers & Services (0.46)
Transportation > Passenger (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
